Elimination of unoccupied state summations 
in ab initio self-energy calculations for large supercells 
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<-h ■ We present a new method for the computation of self-energy corrections in large supercells. It 

eliminates the explicit summation over unoccupied states, and uses an iterative scheme based on 
an expansion of the Green's function around a set of reference energies. This improves the scaling 
of the computational time from the fourth to the third power of the number of atoms for both the 
inverse dielectric matrix and the self-energy, yielding improved efficiency for 8 or more silicon atoms 
per unit cell. 
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The quasiparticle (QP) band structure of a system of 
interacting electrons (the single-particle-like approximate 
qq ■ eigenstates which describe the addition of an electron or 
| a hole) is obtained from solutions of a Schrodinger equa- 
"'j-t ■ tion in which exchange and correlation is described by 
the electron self-energy E. Ab initio calculations of QP 
energies for real solids have been performed since the 
1980s [e.g. generally using two approximations: 

(i) the self-energy is evaluated within Hedin's GW ap- 
proximation dJ, where it is described as the convolution 
q | of the one-particle Green's function G and the screened 
• • ■ Coulomb interaction W, both of which are obtained from 
an initial density-functional-theory (DFT) calculation us- 
ing the local-density approximation (LDA); and (ii) the 
QP energies are evaluated in first-order perturbation the- 
ory, starting from the DFT-LDA eigenvalues and eigen- 
states. Band structures in excellent agreement with ex- 
periments have been obtained in this way for many sys- 
tems, but the applications are at present restricted to 
relatively small basis sets and unit cells: calculating the 
inverse dielectric matrix and the QP energies is com- 
putationally demanding, and scales essentially as N* t> 
the fourth power of the number of atoms in a super- 
cell. The potential applications of ab initio electronic- 
structure calculations are therefore restricted in compar- 
ison with ground-state calculations. 

A recent real-space-imaginary-time approach JEJ scales 
as N% t for the construction of the Green's function and 
as N at for the formation of the dielectric matrix and self- 
energy. However, the method is designed for calculations 
that require the whole self-energy S(r, r', uS) for all r and 
r', and, for feasible system sizes, is less efficient if only 
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a few matrix elements of E are required. A mixed-space 
approach ^ for the dielectric matrix scales as N^ t , but 
does not address the construction of the self-energy. In 
this paper we describe a new approach that yields effi- 
cient calculation of QP energies for supercells of the order 
of 10-100 atoms. 

In the past, considerable progress has been made in the 
calculation of the single-particle time-ordered response 
function Xo, 



o , , 2^,. . (nk>\e-^+ G > r \n>k)(n'k\e^+ G '> r '\nk') 

X GG' 1i w ) = o V/nk' — Jn'kJrB B 1 T~ TB B vT ( L ) 

U *-f (£ nk / - £?„/ k + uj + it] ■ sgn [E n > k - £ nk ' )) 



where k' = k — q, n and n' run over the bands, the 
/„'k are occupation numbers, |nk') are the one-electron 
eigenfunctions calculated, for instance, using the local- 
density approximation, and f2 is the volume of the unit 
cell. Based on the perturbation summation approach of 
Dalgarno and Lewis (t]] , Baroni and co-workers HH have 
designed a Green's-function approach which avoids the 
explicit summation over unoccupied states in (1) for the 
case of static response; their method has recently been 
generalized to the dynamical xo P^j ■ It is hence a natu- 
ral idea to extend those advantages to self-energy calcu- 
lations, in which xo is the main ingredient for the deter- 
mination of the inverse dielectric matrix £Q G ,(q, u>), and 
a second sum over empty states, arising from the Green's 
function G, appears in the expression for the matrix el- 
ements of E. At first sight the matrix elements of E 
are formally similar to xoj especially in a plasmon-pole 
approximation such as that of Ref. |ll[]: essentially, (1) 
has to be modified by some (G, G')-dependent prefac- 
tors, and oj substituted by the plasmon-pole parameters 
wgg (see later). However, two main obstacles hinder 
the extension of the approach to self-energy calculations. 
First, when all matrix elements (G, G') of xo are needed, 
as is the case in QP calculations, the straightforward ap- 
plication of the method proposed in Ref. jl0| to xo still 
yields an N^ t scaling: this is because the method requires 
a matrix inversion, which scales as N% t , for each of the 
N v energy denominators appearing in (1), where N v , the 
number of valence states, is obviously proportional to 
N a t. In the case of (E) the situation is even worse, since 
the number of different energy denominators is propor- 
tional to N„ times the number of different plasmon pole 
frequencies wgc, one for each (G,G') pair. The scaling 
would be in this case N% t \ 

Here we propose an extension of the Green's-function 
technique to the calculation of self-energy corrections 
which maintains its advantages, and moreover has the 
improved scaling of N% t (ignoring log N at contributions) 
for both Cq G ,(q, u>) and (E), together with a favourable 
prefactor. Our approach is based on Taylor expansions 
of the Green's functions, which are shown to converge 
rapidly, keeping the same numerical stability and control- 
lability as the commonly used empty-states summation 
method. We illustrate the performance of our method for 
the example of successively large supercells of bulk sili- 
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con, showing that quasiparticle calculations in the frame- 
work of the standard GW approach f° r systems 
such as point defects or amorphous silicon are made fea- 
sible. 

We start from the Green's-function idea of Ref. [p|JlO| 
for the calculation of xo m (l)j where the solution of the 
linear system 

(-H + E vk , ± lo + z, 7 ) |*± k , q G , J = e^+ G ')'>k') 

(2) 

for the "polarization state" l^^'qCu) allows to 
rewrite xo as 

x G G '(q^) = ^$>,k'|e- i(q+G>r (l* + > + 1*-)) (3) 

k v 

As above, |wk) is the LDA Bloch function e T u Vt k(r), 
k' = k q, and H is the LDA Hamiltonian. 

When id tends to zero, (— H+E v ^±ui + iri)^ 1 diverges, 
as do \^ + ) and l^ - ). However, the sum (3) remains 

finite, since (%, , k ,_ q ,_o' ,> k ') = -(^l^k^q.G-,,)' 
For numerical stability, it is then better to project out 
the valence states from H since the beginning, in a way 
similar to that of ref. ||: we change the operator H 
appearing on the left side of Eq. (2) into H = HP where 
P = 1 — J2 V \ v )( v \' an d we modify the right-hand side of 
Eq. (3) inserting the projector P to the left of both ty + 
and 

We write (2) in reciprocal space as 



(-i?k+G,k+G« + (£rf±^ + »fc) /±,(u,k',q,G',w) =« G -G'(«,k / ) (2b) 



G' 

where / ± (r) = e~* k-r, 5(„ k; G , w (r) is a periodic function 
in r. An LU decomposition (or inversion) of (—H + E v ± 
u) + irj) scales as N 3 with the number of plane waves, N, 
for every valence energy E v , yielding the N* t scaling of 
the "naive" implementation. Then, the solution for each 
right-hand side requires N 2 operations for every G' and 
every u v ^> (or N 2 logN operations for every u v ^> and the 
whole set of G' , when fast Fourier transforms (FFTs) are 
used) . 

Equation (3) becomes 



XGG'(q,^) = ^E [ dr <k'(r)e^ G - r (/ r + («,k',q,G', W ) + /-(«,k',q,G', W )), (3b) 



rk 

which is again computed efficiently using FFTs. Before 
coming to the possible improvements in the calculation of 
Xo, it is useful to look directly at the self-energy matrix 
elements. The main numerical effort lies in the evaluation 
of the correlation contribution, which is 



/ nv ( M u\ - 2?r V- n GC (nk|e-^q+ G )- r |c , k / )( C / k / |e'(q+ G ')- r, |nk) 

( ' C[UJ)1 } |q+ Gllq + G'l ^GG'(q) & - -GG'(q) - E dV ) 

27r n GG , (nk|e-^q+ G )- r |t; / k , )^ / k / |e'(q +G, )- r '|nk) 

+ Mi |q+G||q + G'| WGG'(q) + £ GG '(q) - K'k') 
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where k' = k q, Ugc and ujgg' are the plasmon- 
pole parameters determined by the energy dependence 
of eQ 1 G ,(q, lu), and d (v') denote sums over unoccupied 
(occupied) states. Because of the sum over unoccupied 
states, the first of the two terms is the computation- 
ally demanding one. Contrarily to eq.(l), here there 
is a different energy denominator for each plasmon-pole 
frequency, £>gg'- As pointed out above, performing a 
new LU decomposition (or inversion) for each individ- 
ual (G,G') pair would lead to an N% t scaling, making 
the approach disadvantageous even with respect to the 
traditional method. However, it is known that S(w) is a 
smooth function of lu, which implies that it might also 
be a smooth function of lu — wgc- We observe that 
many of the N 2 different u)gg' have similar values, and 
that lu is typically taken at the DFT eigenvalue of state 
\nk); it is therefore possible to evaluate the contributions 
from different lugg' by means of a Taylor expansion of 
(lu — Qgc — H) around u> — lugg 1 = E 7lt Q — luq, for 
one or more expansion points E n $ and luq in the range 
of interest. Since the width of the energy interval into 
which the E n of interest and the plasmon pole parame- 
ters £jgc are scattered does not grow with the system 
size, the number of expansion points needed - and hence 
the number of matrix inversions - no longer depends on 
N«t. 

Expression (4) also contains possibly divergent con- 
tributions, which could hinder the application of the 
Green's-function approach to the first term: when E c 'k' is 
an unoccupied state, divergences arise for lu — E c iyj = lu, 
hence when lu is at an energy at least cDgc above the 
lowest conduction state. In order to avoid such diver- 
gences, we divide the unoccupied states into two groups: 
the 'low-lying' states with energies from the Fermi energy 
to somewhat beyond the highest quasiparticle energy of 
interest, and the states with higher energies. Then we 
apply the Green's-function trick to the latter states only, 
now including the low-lying states in the projector P, 
making the solution well defined for all the energies in 
the range of interest. 

The Taylor coefficients, i.e. the energy derivatives of 
G(Cu) = (—HP + £ ,„ — a)) -1 , are essentially powers of 
G(luq)- Then, their computation only requires as many 
matrix multiplications as the order of the expansion (typ- 
ically 1 or 2). The contributions of the different orders 
can be evaluated separately, in a CPU time proportional 
to N% t jl2|]. The derivative (9£/<9w), used in calculating 
<n|£(w)|n> as (n\Z(E% FT )\n)+(dE/ ' du)-{tu-E» FT ) (see 
Rcf. |ll|), is also determined by the second power of G, 
and hence does not require any further effort. The only 
term whose evaluation is in principle still proportional to 
N^ t is the summation over the occupied and lowest un- 
occupied states, which is performed explicitly. However, 
the actual number of operations involved is negligible, 
since the number of these low-lying states is small. 

The expansion approach can be used in the same way 
to improve the calculation of xoi 1 ^)- Here there are in 
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principle as many different energies in the denominator 
as valence states. Using the expansion technique, one 
only has to compute (-H + E v q + uj)~ 1 for a number of 
expansion points E v q which is proportional to the width 
of the valence band (independent of system size). In the 
case of xo for silicon, inclusion of orders (i — 1) from to 
3 is sufficient to achieve an accuracy of about 50 meV in 
(£}, using 3 expansion points. 

To demonstrate that the method is well suited to ab 
initio GW calculations for large unit cells with many elec- 
trons, we have computed the GW-corrected electronic 
structure for different supercells of bulk silicon, using the 
r point only. In Table I, we illustrate for the 2-atom cell 
the good convergence of single elements of e _1 with the 
order of the Taylor expansion. "Exact" results are ob- 
tained by using the traditional method, with 181 plane 
waves (which corresponds to the 12.5 Ry energy cutoff 
used in the DFT calculation) and the whole 181 states. 
Table II shows the results for the matrix elements of S 
as a function of the order of its Taylor expansions (using 
a 3rd order expansion for e _1 throughout). We compare 
with the exact results and with the approximate results 
which are obtained with the traditional approach, using 
169 plane waves and 112 states. The 'traditional' results 
have an accuracy of 50 meV, and, with our method, in- 
clusion of orders 0, 1 and 2 are seen to be sufficient to 
achieve the same accuracy. 

The computational time as a function of the supercell 
size is illustrated in Fig. 1. To allow a fair comparison, 
both calculations have been performed with the param- 
eters leading to an accuracy of about 50 meV. The two 
approaches are already equivalent for an 8-atom silicon 
supercell. With 54 atoms the new method gains a factor 
of five in computing (£); here the CPU time required to 
compute the dielectric function, plus £ and its energy 
derivative for all 66 states lying within 0.1 Hartrees of 
the Fermi energy is 42 hours on a Cray C98 computer, 
22 hours of which is for xo- This means that quasi- 
particle calculations in the framework of the standard 
GW approximation for systems such as point defects or 
amorphous silicon are feasible with a reasonable compu- 
tational effort. We have already successfully applied the 
present approach to the calculation of xo and (£) for 
sodium clusters in a large supercell Q |. 

A possible improvement, which will reduce the com- 
putational effort further, is to exploit the fact that in 
our scheme, by analogy with what is pointed out in 
Ref. || and [pi, we compute quantities which are short- 
ranged in (r — r'). Introducing a real-space cutoff in 
(— HP + ui — a)) -1 will reduce the computational expense 
significantly (at present, without the cutoff, FFTs ac- 
count for more than half of the total CPU time.) 

In summary, we have introduced a scheme which ex- 
tends towards larger sizes and complexity the set of phys- 
ical systems for which self-energy-corrected electronic 
structure can be computed, within Hcdin's GW scheme 
and the plasmon-pole approximation. By avoiding the 
need for an explicit summation over conduction states, 
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and introducing an iterative scheme to describe the en- 
ergy dependence of the Green's function by expanding 
around a few reference energies, the method reduces the 
scaling of the computational time from N* t to N% t . Sys- 
tems with a number of atoms N at of the order of 50, 
which would require a prohibitive computational effort 
within traditional GW schemes, thereby become acces- 
sible. The present method appears to be a promising 
tool for the study of complex structures such as clusters, 
reconstructed surfaces, and point defects in semiconduc- 
tors, since they often fall within this class of systems. 

This work was supported in part by the European 
Community programme "Human Capital and Mobility" 
through Contract No. ERB CHRX CT930337. Com- 
puter time on the Cray C98 was granted by IDRIS 
(Project No. CP9/950544). 
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TABLE I. Typical convergence of (G, G') elements of e 1 with the order of the Taylor expansion (see text). 
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TABLE II. Quasiparticle corrections to the LDA eigenval- 
ues for bulk silicon, calculated with the present method for 
different orders of the Taylor expansion for E (columns 1, 2 
and 3), and with the traditional method summing over all 
the conduction states (for the chosen plane-wave basis set) 
(column 4), and summing over about 2/3 of the conduction 
states (column 5). Values in eV. 
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FIG. 1. CPU time required for the calculation of the GW 
corrections to the LDA electronic structure, as a function of 
the number of atoms in the supercell. Filled diamonds are for 
the traditional method, hollow squares for the present scheme. 
In the inset, a log- log plot of the same variables shows the 
cross-over at the 8 atom supercell. 
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